Abstract

A  dynamic  communication  network  is  one  in  which  links  may  repeatedly  fail  and  recover.  In  such 
a  network,  though  it  is  impossible  to  establish  a  path  of  unfailed  links,  reliable  communication  is 
possible,  if  there  is  no  cut  of  permanently  failed  links  between  a  sender  and  receiver. 

We consider the basic task of end-to-end communication, that is, delivery in finite time of data
items generated on-line at the sender, to the receiver, in order and without duplication or omission.

The  best  known  previous  solutions  to  this  problem  had  exponential  complexity.  Moreover,  it  was 
conjectured  in  [AG88]  that  a  polynomial  solution  is  impossible. 

This  paper  disproves  this  conjecture,  presenting  the  first  polynomial  end-to-end  protocol.  The 
protocol  uses  methods  adopted  from  shared  memory  algorithms,  and  introduces  novel  techniques  for 
fast  load  balancing  in  communication  networks. 

1 Introduction


A  basic  task  in  any  network  is  that  of  end-to-end  communication,  that  is,  delivery  in  finite  time,  of  data 
items  generated  at  a  designated  sender  processor,  to  a  designated  receiver  processor,  without  duplication, 
omission  or  re-ordering  of  the  data  items.  The  data  items  could  represent  transactions  of  a  stock  exchange, 
speech  or  video  signals,  military  commands,  etc.  In  almost  all  cases,  data  items  are  generated  on-line 
and  are  not  available  at  the  beginning  of  the  protocol’s  execution. 

In a reliable network, where communication links never fail, this task is performed easily by
establishing a fixed communication path between the sender and the receiver, and sending all data
items along this path. Unfortunately, existing communication networks, e.g. the ARPANET [MRR80] and
DECNET [Wec80], have a dynamic topology in the sense that links may repeatedly fail and recover,
making it impossible to rely on the use of any single communication path.

The  “classical”  approach  to  handle  the  problem  is  to  construct  a  new  communication  path  every 
time  the  previous  path  fails,  taking  care  to  purge  any  messages  in  transit  on  the  old  path.  However, 
this  approach  is  very  limited,  since  its  implementations  (as  in  [Fin79,  Gal76,  AAG87,  AS88,  AAM89, 
AGH89])  require  strong  assumptions  regarding  the  allowable  patterns  of  link  failures  in  the  network.  In 
[AAG87,  AS88,  AAM89]  for  example,  the  assumption  is  that  the  whole  network  stabilizes  for  a  period  of 
time  long  enough  to  allow  construction  of  a  path  and  communication  over  it.  The  weakest  assumption 
among  those  above,  presented  in  the  broadcast  protocol  of  [AGH89],  still  requires  that  all  the  edges  on 
some  path  between  the  sender  and  receiver  be  operational  for  the  entire  time  period  required  to  construct 
that  path  and  communicate  the  data  over  it. 

This  assumption  is  overly  optimistic,  since  for  example,  if  every  edge  has  a  constant  probability  of 
being  operational  (or  not  operational)  at  a  given  time,  then  the  probability  of  the  whole  path  being 
operational  at  a  given  time  is  exponentially  small  in  the  length  of  the  path. 
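
To make the estimate concrete, assume (an independence assumption added here for illustration, not stated in the text) that each of the k edges on a path is operational independently with probability p. The symbols p and k are introduced only for this calculation.

```latex
\Pr[\text{all } k \text{ edges operational}] = p^{k},
\qquad \text{e.g. } p = 0.9,\ k = 50:\quad 0.9^{50} \approx 0.005 .
```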

However,  one  can  see  that  the  existence  of  an  operational  communication  path  is  not  a  necessary 
condition  for  communicating  between  two  nodes  (processors).  In  fact,  as  stated  in  [AE86],  the  necessary 
condition  is  merely  that  there  is  “eventual  connectivity”  between  the  sender  and  receiver,  in  the  sense 
that there is no permanent cut (see [Vis83]) of failed edges between them. More precisely, there exists
no partition of the network into two sets, one containing the sender and the other the receiver, such that
from some time on, no operating edge connects a node in one set to a node in the other set.

Early works [Vis83, AE83, AE86] solving the end-to-end problem under the minimal conditions alone
were based on the use of “unbounded sequence numbers”, implying that both the message size and the
amount of memory needed grow with the number of data items transmitted. In other words, the complexity
of the protocols was unbounded in terms of the true input to the problem, namely, the size of the network.

The elegant and surprising work of Afek and Gafni [AG88] presented the first “bounded complexity”
end-to-end protocol. Unfortunately, this solution required exponential message complexity (because the
number of times a given data item is sent over a network link is exponential). Moreover, it was believed
by many researchers, and conjectured in [AG88], that no polynomial solution exists, leaving little hope
for a reasonable solution to the end-to-end problem.

In  this  paper,  we  disprove  this  conjecture,  presenting  an  end-to-end  protocol  that  is  polynomial  in  both 
messages  and  space.  The  protocol  is  based  on  a  new  technique  for  sequence  numbering,  that  combines 
the  sequential  time-stamp  schemes  used  in  shared  memory  algorithms  ([Lam86,  IL87,  DS89]),  with  a 
novel and highly fault tolerant load balancing method that preserves global properties based on
local information only (as in Goldberg-Tarjan [GT88] and Ahuja-Orlin [AO87]).

2 Problem Statement

2.1  The  network  model 

Consider  a  communication  network  in  the  form  of  an  undirected  graph  G(V,  E),  where  the  nodes  are  the 
processors  and  the  edges  are  the  links  of  communication.  Each  undirected  link  consists  of  two  directed 
links,  delivering  messages  in  the  opposite  directions.  Below  we  describe  the  properties  of  a  directed  link. 

Each link has a finite capacity, in the sense that only a constant number of messages is allowed to
be in transit on a given link at a given time (i.e., we consider only protocols that obey this property).
The communication over links obeys the FIFO rule: the sequence of messages received over the
link is a prefix of the sequence of messages sent over the link. Also, the communication is completely
asynchronous, namely, there is no a priori bound on link delays.

A link is non-viable if, from some message on, it will not deliver any further messages
to the other end-point; for those messages the delay is considered to be infinite ($\infty$). The sequence of
messages  received  is  in  this  case  a  proper  prefix  of  the  sequence  of  messages  sent.  Otherwise,  the  link  is 
viable.  An  undirected  link  is  viable  if  both  of  the  directed  links  that  it  consists  of  are  viable. 

We  say  that  the  sender  is  eventually  connected  to  the  receiver  if  there  exists  a  (simple)  path  from  the 
sender  to  the  receiver,  consisting  entirely  of  viable  links.  Clearly,  if  non-viable  links  create  a  cut  of  the 
network,  disconnecting  the  sender  from  the  receiver,  then,  eventually,  the  sender  will  not  be  capable  of 
delivering  messages  to  the  receiver. 

2.2  The  end-to-end  problem 

The  purpose  of  the  end-to-end  protocol  is  to  establish  a  (directed)  “virtual  link”  to  be  used  for  delivery 
of  data  items  from  the  sender  to  the  receiver.  It  is  required  that  this  virtual  link  be  viable  if  and  only 
if the sender is eventually connected to the receiver. This virtual link should have the same properties as a
“regular”  network  link,  namely: 

Safety:  The  sequence  of  data  items  output  by  the  receiver  is  a  prefix  of  the  sequence  of  data  items  input 
by  the  sender. 

Liveness:  If  the  sender  is  eventually  connected  to  the  receiver,  then  each  data  item  input  by  the  sender 
is  eventually  output  by  the  receiver. 
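
The safety condition is a prefix test on two sequences, which can be sketched in Python as follows. The function name `is_prefix` and the list representation of the input and output sequences are assumptions made for illustration. Liveness is a property of infinite executions and is not captured by such a finite check.

```python
def is_prefix(output, inputs):
    """Return True if `output` is a prefix of `inputs` (the safety condition)."""
    output, inputs = list(output), list(inputs)
    return len(output) <= len(inputs) and output == inputs[:len(output)]
```

For example, an output of `["a", "b"]` is safe with respect to the input `["a", "b", "c"]`, whereas `["a", "c"]` (an omission) and `["a", "a"]` (a duplication) are not.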

2.3  The  complexity  measures 

We  consider  the  following  complexity  measures. 

Communication: The number of bits transferred in the network, per data item delivered. That is, the
total number of bits sent in the period of time between two successive data item deliveries at
the receiver.

Space:  The  maximal  amount  of  space  required  by  a  node’s  program  throughout  the  protocol. 

Time:  The  maximal  length  of  a  time  interval  between  two  successive  data  item  deliveries  at  the  receiver, 
under  the  assumption  that  delays  of  viable  links  are  upper-bounded  by  1  time  unit. 

Computation  time:  The  maximal  number  of  local  computation  steps  a  node  performs  in  the  interval 
of  time  between  two  successive  data  item  deliveries,  provided  that  this  data  item  is  not  the  last 
data  item. 

Definition 2.1 A protocol is bounded if its communication, space, time, and computation time complexities
are independent of the number of data items, depending only on the size of the network.

Definition 2.2 A protocol is polynomial if its communication, space, time and computation time
complexities are upper-bounded by polynomials of the size of the network.

We stress that being able to send (receive) an infinite number of messages does not require the
sender (receiver) to have infinite space. A single buffer at the sender (receiver) suffices to store
the next data item to be transmitted. The precise formulation of this “interactive” statement of
the problem can be found in [LMF88].

2.4  Relations  to  other  models 

The model described above is called the “$\infty$-delay model” in [AG88], and the “fail-stop model” in [AM88].
As mentioned in the introduction, our motivation here is to deal with networks whose topology changes
frequently. In such dynamic networks, links may fail and recover many times (yet processors never fail).
Each failure or recovery of a network link is eventually reported at both end-points by some underlying link
protocol. As pointed out by Afek and Gafni [AG88], this dynamic model is easily reducible to the model
described above. The simulation of the dynamic model by the fail-stop model is as follows. A message to
be forwarded on a link is stored in a buffer until the link recovers and all the previously sent messages
have been delivered. A protocol similar to the data-link initialization protocol of Baratz and Segall [BS88]
is used to guarantee that no messages are lost or duplicated. Each link in the dynamic network that does
not (does) fail forever is represented by a viable (non-viable) link in the fail-stop model. Any two nodes
eventually connected in the dynamic network are eventually connected in the fail-stop network.

3  Informal  Description 

In this section an informal outline of the protocol and the main ideas leading to it are presented. The
presentation begins with a description of a very simple end-to-end protocol using unbounded sequence
numbers, i.e., messages of unbounded size. This algorithm is then refined through a series of steps, to
derive a bounded protocol having polynomial complexity.

The  simple  unbounded  protocol  involves  two  types  of  messages.  The  “data”  message  contains  some 
data  item  which  is  to  be  delivered  to  the  receiver;  the  “acknowledgment”  (in  short,  “ack”)  message  serves 
to  acknowledge  the  receipt  of  the  data  item  at  the  receiver.  Both  the  data  and  the  ack  messages  carry 
the  (unbounded)  sequence  number  of  the  data  item  in  question;  the  data  message  also  carries  the  data 
item  itself. 

The protocol works as follows. Once the sender inputs the data item of sequence number i, it sends the
data message (indexed with i) to all its neighbors. Every node, upon receiving this message, forwards it
to all its neighbors, unless it has already received a data message with a higher sequence number. The
receiver, upon receiving the data message of sequence number i, sends back an ack message with sequence
number i. The sender will input the next data item (with sequence number i + 1) only after it has received
the ack for data item i. Note that the protocol creates a situation where many messages may be in transit
on the same link at the same time.
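
The simple unbounded protocol can be sketched as a small simulation. This is only an illustration under stated assumptions: the graph is static, connected and failure-free, acks are flooded in the same way as data messages, and a node drops any message whose sequence number is not strictly greater than the highest it has seen of that kind (a slightly stronger rule than the one in the text, chosen so that flooding terminates). The function and variable names are hypothetical.

```python
from collections import deque

def run_unbounded_flooding(adj, sender, receiver, items):
    """Deliver `items` in order by flooding DATA/ACK messages tagged with
    unbounded sequence numbers. Assumes a connected graph given as
    {node: [neighbours]} with sender != receiver."""
    best = {v: {"DATA": -1, "ACK": -1} for v in adj}  # highest number seen
    in_transit = deque()   # entries: (destination, kind, seq, payload)
    output = []

    def flood(node, kind, seq, payload):
        for nbr in adj[node]:
            in_transit.append((nbr, kind, seq, payload))

    for seq, item in enumerate(items):
        best[sender]["DATA"] = seq
        flood(sender, "DATA", seq, item)
        # The sender inputs item seq + 1 only after the ack for seq arrives.
        while best[sender]["ACK"] < seq:
            node, kind, s, payload = in_transit.popleft()
            if s <= best[node][kind]:
                continue                      # stale or duplicate: drop it
            best[node][kind] = s
            if node == receiver and kind == "DATA":
                output.append(payload)        # deliver, then acknowledge
                best[receiver]["ACK"] = s
                flood(receiver, "ACK", s, None)
            else:
                flood(node, kind, s, payload)
    return output
```

The single global queue stands in for all links at once, which is why many messages can be "in transit" simultaneously, as noted above.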

The first very simple modification is intended to guarantee that at any given time, there is at most one
message in transit in a given direction on a link. This is achieved by letting every process send the ack
of i on every edge on which the data message with i was received. A process does not send a further data
message on an edge before it receives the ack for the last data message sent on that edge. Though this only
means that messages are stored in the process rather than on the channel, the number of messages stored
per edge can be reduced to one, by noticing that it suffices to maintain only the message with the highest
sequence number. Observe, however, that the complexity is still unbounded, since the size of the sequence
numbers i is not bounded.
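
A minimal sketch of the per-edge rule just described: at most one unacknowledged data message per direction, and only the pending message with the highest sequence number is kept. The class `EdgeOutbox` and its method names are hypothetical and chosen for illustration.

```python
class EdgeOutbox:
    """Outgoing side of one directed edge under the one-message-in-transit rule."""

    def __init__(self):
        self.awaiting_ack = None   # sequence number currently in transit
        self.pending = None        # (seq, item) waiting for the edge to free up

    def offer(self, seq, item):
        """Ask to forward (seq, item); return the message to transmit now, or None."""
        if self.awaiting_ack is None:
            self.awaiting_ack = seq
            return (seq, item)
        # Edge busy: remember only the message with the highest sequence number.
        if self.pending is None or seq > self.pending[0]:
            self.pending = (seq, item)
        return None

    def on_ack(self, seq):
        """Handle an ack; return the next message to transmit, or None."""
        if seq != self.awaiting_ack:
            return None
        self.awaiting_ack = None
        if self.pending is None:
            return None
        msg, self.pending = self.pending, None
        self.awaiting_ack = msg[0]
        return msg
```

The buffer never holds more than one message, but the sequence numbers stored in it still grow without bound, which is the remaining problem noted above.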

At this point, the following important observation is made. Although the protocol uses an unbounded
number of different label values, at each point in time, the number of different label values in the system is
linear in the number of edges. Let it first be shown how, assuming that the sender has a label oracle telling
it whether a given label value $\ell$ exists in the system, a polynomial end-to-end protocol using only bounded
size labels can be designed (this protocol is called the main protocol). The label oracle enables the sender
to compute the set of values that exist in the system, denoted by $\bar{\ell}$.

The idea is that the sender, in order to send a new data item, generates a label that is “greater” than
all the labels in $\bar{\ell}$. A well known mechanism that achieves this goal is a bounded sequential time stamp
system (see [IL87, DS87]). The time stamp system to be used will be of size N, where N is polynomial
in n, and each label has size O(N) bits. Such a system guarantees that for any set of values $\bar{\ell}$ that
has fewer than N values, a label that is not in $\bar{\ell}$ and is “greater” than any value in $\bar{\ell}$ can be found.
Note that the system also defines the operation “greater” associated with the labels. Using such a time
stamp system and the oracle, the sender can always find a new label that is greater than all the labels
in the system and is of size polynomial in n.

Most of the effort in the protocol goes into implementing a mechanism similar to the above label oracle.
One would like a node to “know” locally that it is clean of a given label $\ell$, that is, all references to it
can be eliminated. To achieve this, two modifications are introduced. First, the sending of replies to a
message with label $\ell$ is restricted to edges that were traversed by the message $\ell$. This implies that a
node can receive a reply to message $\ell$ only on edges on which message $\ell$ has been sent. Second, a node
does not send a new data message on an edge until it receives a reply to the previous message sent along it.

Assume that node v receives a message with label $\ell$ and forwards it. Assume also that after some time
it received a reply for $\ell$ on all the links on which it forwarded the message, and also received a message
labeled later than $\ell$. At this point none of the local variables in v need contain a reference to label $\ell$.
Thus, node v creates a token that includes its name v, the label $\ell$ and the list of edges that the message
$\ell$ was sent on. The token is an indication that node v is clear of references to label $\ell$.

The remaining unresolved problem is that of getting all the tokens with label $\ell$ to the sender, so it
can deduce that $\ell$ is not in the system, since all nodes were clean when the tokens were created, and all
tokens are in the sender. If the sender collects the tokens from some of the nodes, but not all of them,
it can check locally whether it has all the tokens created for a given $\ell$ in the following way. Using the
lists of edges (listing on which edges message $\ell$ was sent) in the tokens of label $\ell$, the sender can create
a set S of nodes that certainly received the message $\ell$. If the sender has received a token from every node
in S, then there is no node in S that sent message $\ell$ to a node not in S. Since the sender is in S, nodes
not in S never received message $\ell$.
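
The sender's local completeness check can be sketched as follows. This is an illustration under assumptions: the tokens collected for one label are represented as a mapping from the node that created the token to the set of neighbours it sent the message to, the sender's own record is included in that mapping, and a node that forwarded the message to nobody contributes a token with an empty set. The function name is hypothetical.

```python
def all_tokens_collected(sender, tokens):
    """Return True if the tokens held for one label cover every node that
    certainly received the message with that label.

    tokens: {node: set of neighbours the node sent the message to}
    """
    # S: the sender, every token creator, and every endpoint named in a token.
    reached = {sender} | set(tokens)
    for sent_to in tokens.values():
        reached |= set(sent_to)
    # Complete exactly when every node in S has handed in its token.
    return all(node in tokens for node in reached)
```

If the check succeeds, no node in S forwarded the message outside S, so the label can be treated as absent from the system.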

The problem in getting all the tokens to the sender is that some of the edges on which the tokens
are sent may fail. This could be solved by duplicating tokens and sending them on different paths.
However, duplicating tokens prevents the sender from checking locally whether it has all the
tokens of $\ell$ that were created.

Assume one could bound the token capacity of the network so that each process apart from the sender
(whose capacity is unbounded) could contain at most some fixed number of tokens. After the network’s
token capacity is reached, the creation of a new token in any process would imply that some token was
received by the sender. By simple pigeonhole arguments it can be shown that after a bounded number
of such token creations, there would be a message $\ell$ for which all the tokens were received by the sender.

The solution to the token collection problem is thus a fault tolerant load balancing protocol to assure
that tokens are evenly distributed among processes in the system, maintaining the property that every
process apart from the sender has bounded capacity. The protocol assures that no matter which
communication path becomes eventually connected, there are sufficiently many tokens on this path, and one
will eventually be forwarded to the sender.

The basic idea behind the load balancing protocol is the following. Each node has some quantity
of tokens. Assume for a moment that the network is synchronous and static. Consider the following
protocol. At every even clock tick a node sends a token to each neighbor that had at least $\Delta$ fewer
tokens than it, and sends nothing otherwise. At every odd clock tick, each node updates its neighbors about
the number of tokens it has. After a polynomial number of iterations the protocol will, given that
$\Delta = \Omega(n)$, converge to a steady state (i.e., no tokens are sent). The surprising fact is that a very
similar protocol will converge, and with polynomial communication, in an asynchronous network where links
may fail.
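
The synchronous, static version of the balancing rule can be sketched as a simulation. This is an illustration under assumptions: the threshold parameter is called `delta`, "sends a token to a neighbor" is read as "to each neighbor known to hold at least `delta` fewer tokens", and one loop iteration combines the update tick and the send tick. The function name and the `max_rounds` cap are hypothetical.

```python
def balance(adj, tokens, delta, max_rounds=10_000):
    """Run synchronous token balancing on a static graph.

    adj:    {node: [neighbours]} (undirected graph)
    tokens: {node: initial token count}
    Returns (final token counts, number of rounds in which tokens moved).
    """
    tokens = dict(tokens)
    for rounds in range(max_rounds):
        # Update tick: every node learns its neighbours' current counts.
        snapshot = dict(tokens)
        # Send tick: one token to each neighbour that is at least delta behind.
        sends = [(u, v) for u in adj for v in adj[u]
                 if snapshot[u] - snapshot[v] >= delta]
        if not sends:
            return tokens, rounds      # steady state: no tokens are sent
        for u, v in sends:
            tokens[u] -= 1
            tokens[v] += 1
    raise RuntimeError("no steady state within max_rounds")
```

In the steady state no two neighbours differ by `delta` or more, and the total number of tokens is unchanged, since this sketch neither creates nor absorbs tokens.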

In  the  protocol,  the  number  of  tokens  stored  in  a  node  is  bounded  by  a  polynomial  in  n.  In  order 
to enforce this bound, a node that has more than a certain number of tokens is blocked, and does not
respond  to  messages  of  the  main  protocol.  This  guarantees  that  a  node  will  not  generate  additional 
tokens  locally.  Furthermore,  it  can  be  proven  that  this  rule  does  not  cause  deadlock.  The  bound  on  the 
number  of  tokens  in  each  node  implies  that  the  number  of  tokens  that  can  be  in  the  network  at  any  point 
in  time  is  polynomial  in  n  (which  is  less  than  N).  Any  label  that  is  sent,  for  which  not  all  the  tokens 
have  been  collected,  is  assumed  to  exist  in  the  system.  The  value  of  N  will  be  chosen  to  be  more  than 
the  “label  capacity”  of  the  network,  and  thus  the  sender  will  always  be  able  to  generate  a  new  label. 

The  formal  proof  of  the  load  balancing  protocol  uses  amortized  analysis  to  show  that  though  it  could 
be  that  some  given  token  cycles  forever  in  the  network,  the  total  number  of  tokens  sent  in  the  period  of 
time  between  two  successive  message  receipts  at  the  receiver,  is  bounded  by  a  polynomial  in  n.  In  the  rest 
of  this  section  we  sketch  an  intuitive  argument  for  the  complexity  of  the  load  balancing  protocol.  Note 
that every token that is sent creates at most 2n updates; therefore, to bound the complexity, it suffices
to bound the number of token messages sent. As long as no new token is created, and no token is received
by the sender, the total number of tokens in the network remains unchanged. As mentioned before, the
aim of the protocol is to distribute the tokens evenly among the nodes. Consider an energy function $\mathcal{E}$
that sums, over all nodes, the square of the number of tokens in the node at a given time. Clearly, this
function achieves its minimum when the tokens are evenly distributed, that is, when there is not enough
energy for a token to be sent, because no two processes have a token difference of $\Delta$. In a static and
synchronous system, each token sent from a process to one with $\Delta$ fewer tokens reduces $\mathcal{E}$ by at least n,
for $\Delta > 2n$. Unfortunately, in our case, due to the asynchrony, updates might be delayed, and based on
outdated information, there may be “bad” tokens whose receipt will increase $\mathcal{E}$ since they were mistakenly
sent to processes having
more  tokens.  However,  in  order  for  such  a  “bad”  token  send  to  occur,  updates  of  many  tokens  must  be 
delayed.  The  property  that  can  be  shown,  and  is  crucial  to  the  complexity  analysis,  is  that  in  order  to 
create  the  many  delayed  updates  necessary  for  one  “bad”  token  to  be  sent,  many  “good”  token  sends 
must  occur,  and  so  it  cannot  be  that  “bad”  tokens  are  continuously  providing  the  energy  for  more  “bad” 
tokens  to  be  sent. 
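
As a sanity check on the claimed energy drop, the following calculation assumes that the energy is the sum of squared token counts, $\mathcal{E} = \sum_v t_v^2$ (the symbols $t_v$, a and b are introduced here for illustration). It considers a single token moving from a node holding a tokens to a node holding b tokens, with $a - b \ge \Delta$ and all other counts unchanged.

```latex
\mathcal{E}_{\text{after}} - \mathcal{E}_{\text{before}}
  = (a-1)^2 + (b+1)^2 - a^2 - b^2
  = -2(a-b) + 2
  \le -2\Delta + 2 .
```

For $\Delta > 2n$ and $n \ge 1$ the right-hand side is below $-4n + 2 \le -n$, so under these assumptions a single "good" send lowers the energy by more than n, consistent with the bound quoted above. A "bad" send, with $a < b$, makes the same expression positive.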

4  The  Main  Protocol 

4.1  Preliminaries 

In  the  following  subsections,  the  code  of  the  main  protocol  meeting  the  desired  end-to-end  properties  is 
described. 

In the presentation of the code, we use the language of guarded commands of Dijkstra [DF88], where
a process code of the form $G_1 \rightarrow A_1 \;\Box\; G_2 \rightarrow A_2 \;\Box\; \cdots \;\Box\; G_k \rightarrow A_k$ is repeatedly
executed. In each execution, of all the guards $G_i$ that are true, an arbitrary i is selected and $A_i$ is
performed. A guard $G_i$ is a conjunction of predicates. The “receive M on e” guard is true if message M
is available in the “incoming messages” buffer of channel e. The execution of the corresponding statement
includes the receipt of the message, and the deletion of the message from the buffer.
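
The execution model of guarded commands can be sketched as a small loop. This is an illustration only: the function name `run_guarded`, the representation of a command as a `(guard, action)` pair of Python callables, and the choice to stop when no guard is true (a real process would wait instead) are assumptions.

```python
import random

def run_guarded(commands, state, rng=None, max_steps=1000):
    """Repeatedly pick an arbitrary command whose guard holds and run its action.

    commands: list of (guard, action); guard(state) -> bool, action(state) mutates state.
    """
    rng = rng or random.Random(0)
    for _ in range(max_steps):
        enabled = [action for guard, action in commands if guard(state)]
        if not enabled:
            break                      # nothing enabled; a real process would wait
        rng.choice(enabled)(state)     # each selected action is one atomic event
    return state
```

A "receive M on e" guard would correspond to a guard that tests whether the incoming buffer of e is non-empty, with an action that removes the message from that buffer.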

To simplify the description of the properties and proofs, global time is assumed. The execution of
each guarded command in the code of a process is thus termed an event, and is assumed to be atomic.
The state of the system at any time consists of the local process states $S_v^t, v \in V$, and channel states
$C_e^t, e \in E$, as they were following the latest event in every process. The subscripts and superscripts
added to variables ($var_v^t$) denote the local process state.

4.2  Properties  of  the  main  protocol 

Let the input sequence $I = (D_0, D_1, \ldots)$ be an infinite sequence of data items to be input, one after
the other, to the sender. Similarly, let $O^t = (D_0, D_1, \ldots)$ be the sequence of data items output by the
receiver in all output events preceding some state $S^t$. Then the following properties must be met by an
end-to-end communication protocol.

P1 Safety In any state $S^t$, the output sequence $O^t$ is a prefix of I.

P2 Liveness For each data element $D_k$ in I, there is a state $S^t$ in which it is added to $O^t$.

One can easily see that P1-P2 are equivalent to the definition in Section 2.2.

4.3  Creating  a  Virtual  Network 

In  presenting  the  protocols  below,  it  is  assumed  that  both  sender  and  receiver  have  a  single  link  that 
is  always  viable,  connecting  them  to  the  rest  of  the  network.  This  assumption  is  made  with  no  loss  of 
generality,  since  one  can  effectively  “split”  the  sender  (or  the  receiver)  node  into  two  virtual  parts,  a 
“special”  sender  node  responsible  for  the  input  (output)  interface,  and  an  “ordinary”  node,  responsible 
for the interface with the rest of the network nodes. Thus, all the network nodes can be partitioned into three
categories:  sender,  receiver,  and  the  “ordinary”  nodes,  where  the  sender  and  receiver  are  each  connected 
to  one  “ordinary”  node.  While  all  the  ordinary  nodes  perform  the  same  protocol,  special  protocols  are 
designed  for  the  sender  and  the  receiver. 

In  the  main  protocol  presented  below,  both  sender  and  receiver  are  split  into  special  and  normal  nodes. 
In  the  label  protocol  presented  in  the  sequel,  only  the  sender  is  split. 

4.4  A  sequential  time-stamp  system 

The  algorithm  used  to  generate  the  labels  added  to  the  data  items  transmitted  by  the  main  protocol,  is 
a  sequential  time  stamp  system  algorithm. 

A sequential time stamp system consists of a set of labels $\bar{\ell} = \{\ell \mid \ell \in L\}$, $|\bar{\ell}| < N$ for some
constant N, and a labeling function $\mathcal{L}(\bar{\ell})$. The label values in the range L are ordered by the irreflexive
and antisymmetric relation $\prec$, described in terms of a precedence graph $G = (L, \prec)$. If the cardinality
of L is bounded, the time stamp system is said to be bounded, i.e., labels are of a bounded size. The labeling
function $\mathcal{L} : L^{N-1} \rightarrow L$, given a set of $N - 1$ labels totally ordered by $\prec$, returns a new label $\ell$,
greater by the order $\prec$ than all $N - 1$ others. A more elaborate description of the properties, upper and
lower bounds of sequential time-stamp system constructions, due to Israeli and Li, can be found in [IL87].

The sequential time-stamp system to be used in our construction is a variant, due to [DS87], of a
construction by Lamport [Lam86]. Let the range L of labels (nodes) in the precedence graph G be of size
$|L| = N \cdot 2^N$. The label value $\ell$ of each node is thus a boolean vector of size $N + \log N$. Let $\log N$ bits
of the vector $\ell$ be a “cluster number” (denoted $C(\ell)$), $C(\ell) = \ell[(N-1) \ldots (N-1+\log N)] \in \{0..N-1\}$.

Procedure $\mathcal{L}(\bar{\ell})$;

    $\ell[(N-1)..(N-1+\log N)]$ := $(i : \forall \ell' \in \bar{\ell},\ C(\ell') \neq i)$;

    for all $\ell' \in \bar{\ell}$ do
        if $C(\ell) > C(\ell')$ then $\ell[C(\ell')]$ := $\ell'[C(\ell)]$ fi;
        if $C(\ell) < C(\ell')$ then $\ell[C(\ell')]$ := not $\ell'[C(\ell)]$ fi;
    od;

    return $\ell$;
end $\mathcal{L}$;


Figure 1: The labeling function


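receive REPLY($\ell$) $\rightarrow$
    trying := false;

$\Box$

free_label_available($\bar{\ell}$); trying = false $\rightarrow$
    input value;
    $\ell$ := $\mathcal{L}(\bar{\ell})$;
    $\bar{\ell}$ := $\bar{\ell} \cup \{\ell\}$;
    send MSG($\ell$, value);
    trying := true


Figure 2: Code of the main protocol: sender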


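Let the remaining N bits $\ell[i]$, $i \in \{0..N-1\}$, identify a node in the cluster. The following is thus the
definition of the relation $\prec$, where $\ell' \prec \ell$ if there is a directed edge from the node of $\ell$ to that of $\ell'$ in G.

$$
(\ell' \prec \ell) =
\begin{cases}
\text{true} & \text{if } (C(\ell') < C(\ell)) \wedge (\ell[C(\ell')] = \ell'[C(\ell)]) \\
 & \text{or } (C(\ell') > C(\ell)) \wedge (\ell[C(\ell')] \neq \ell'[C(\ell)]) \\
\text{false} & \text{otherwise.}
\end{cases}
$$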


That is, labels of nodes in the same cluster are unrelated, and nodes in different clusters are always
related. Figure 1 is the definition of the labeling function $\mathcal{L}$.
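
The relation and the labeling function can be sketched in Python as follows. This is an illustration under assumptions, not the construction of [DS87] itself: a label is represented as a pair `(cluster, bits)` with the cluster number kept as a plain integer rather than encoded in the bit vector, the input labels are assumed to lie in pairwise distinct clusters, and the names `precedes` and `new_label` are hypothetical.

```python
def precedes(lp, l):
    """Return True when label lp precedes label l under the cluster-based relation."""
    (cp, bp), (c, b) = lp, l
    if cp < c:
        return b[cp] == bp[c]
    if cp > c:
        return b[cp] != bp[c]
    return False                        # same cluster: unrelated

def new_label(labels, N):
    """Return a label in an unused cluster that is greater than every label given.

    labels: at most N - 1 labels in pairwise distinct clusters.
    """
    used = {c for c, _ in labels}
    cluster = next(i for i in range(N) if i not in used)
    bits = [False] * N                  # the bit at index `cluster` is never set
    for cp, bp in labels:
        if cluster > cp:
            bits[cp] = bp[cluster]      # equal bits make the older label smaller
        else:
            bits[cp] = not bp[cluster]  # differing bits make the older label smaller
    return (cluster, tuple(bits))
```

Because a new label only writes its own bits, existing labels never change, and a cluster can be reused once its previous label has left the set.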

Note that for the sake of simplicity, the bit $\ell[C(\ell)]$ exists in every label $\ell$, though it is never set
(all nodes with either setting of this bit are equally usable). Proof that the above construction has the
properties of a sequential time-stamp system can be found in [DS87].

The sequential time stamp system used in the protocol has a label set of size $N = 1 + (\Delta + 5)n^2$.
The predicate function free_label_available($\bar{\ell}$), that indicates whether there is a free label value that can
be used, is $|\bar{\ell}| < N$.


4.5  Sender’s  protocol 

The  code  of  the  protocol  is  presented  in  Figure  2. 

The labeling protocol maintains the set $\bar{\ell}$ of labels which are believed to exist in the system.
The boolean function free_label_available($\bar{\ell}$) returns true only if the labeling function $\mathcal{L}$ can be
applied to return a new label. The sender also maintains a boolean variable trying, which is true if the
sender is in the process of delivering the next message to the receiver, and a variable value, which is the
value (contents) of the current data item to be delivered.

The sender reads the input into its variable value when both trying = false (the current data item has
been delivered) and free_label_available($\bar{\ell}$) = true. At that time the sender computes a new label as
$\ell := \mathcal{L}(\bar{\ell})$. It then transmits the information message MSG($\ell$, value) over its (only) outgoing link.


receive MSG($\ell$, value) on e $\rightarrow$
    output value;
    send REPLY($\ell$) on e;


Figure 3: Code of the main protocol: receiver
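
The guarded commands of Figures 2 and 3 can be mirrored in a short Python sketch. The names `Sender`, `labeling_fn`, `send` and `receiver_on_msg` are hypothetical, and the labeling function is passed in as a parameter rather than implemented here. Removing labels from the set is the job of the label protocol and is not modeled.

```python
class Sender:
    """Sender side of the main protocol, one method per guarded command."""

    def __init__(self, n_labels, labeling_fn, send):
        self.N = n_labels
        self.labeling_fn = labeling_fn  # stand-in for the labeling function
        self.send = send                # callback for the single outgoing link
        self.labels = set()             # labels believed to exist in the system
        self.trying = False

    def free_label_available(self):
        return len(self.labels) < self.N

    def on_reply(self, label):
        """First command: a REPLY ends the current delivery attempt."""
        self.trying = False

    def step(self, next_item):
        """Second command; return True if next_item was input and sent."""
        if self.trying or not self.free_label_available():
            return False
        label = self.labeling_fn(self.labels)
        self.labels.add(label)
        self.send(("MSG", label, next_item))
        self.trying = True
        return True


def receiver_on_msg(label, value, output, send_reply):
    """Receiver: output the value and reply on the edge the message came from."""
    output.append(value)
    send_reply(("REPLY", label))
```

In this sketch the sender stops inputting new items once the label set reaches size N, which shows why the label protocol must eventually free labels.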

4.6  Receiver’s  protocol 

The  receiver’s  protocol  is  given  in  Figure  3. 

For every MSG($\ell$, value) received, the receiver outputs the contents value of the message and sends back
REPLY($\ell$).

4.7  Ordinary  node’s  protocol 

The  code  itself  is  given  in  Figure  4. 

The operations of this protocol are performed only while a node is not “blocked” by the label protocol. For that purpose, the node maintains a boolean variable blocked, which is true if the node is blocked. In the sequel, we describe the operations performed at the node while it is not blocked.

An important property of the protocol is that when the sender sends a message MSG(ℓ, value), in every node in the network the variables that depend on the label ℓ are at their initial value; in other words, there is no already existing reference to this label in the system.

The node maintains the variables latest_ℓ, the label of the latest message received, and latest_value, the value of the data item sent in the latest message. It also maintains a number of arrays, each indexed by label values. We use arrays with an entry for each label ℓ in the code for the sake of simplicity only. To achieve polynomial space, the actual implementation of these arrays would be in the form of a space-efficient data structure such as a linked list.

The variable rec_msg[ℓ] (rec_reply[ℓ]) is a boolean array, whose ℓ-th entry is true if MSG(ℓ, value) (REPLY(ℓ)) has been received, but the token with label ℓ has not been generated yet. The following arrays, edges_sent_msg, edges_rec_msg, edges_sent_reply, and edges_rec_reply, are indexed by ℓ. Each entry is a set of edges on which a MSG(ℓ, value) or a REPLY(ℓ) message has been sent or received. Also, for each edge e, we define a variable status[e], taking the values clean or dirty. If status[e] = dirty then there exists ℓ such that e ∈ edges_sent_msg[ℓ] but, at the same time, e ∉ edges_rec_reply[ℓ].

There is a simple update rule for the variable status. Once a message is sent on e, the sending node sets status[e] := dirty. Upon receiving a REPLY on e, status[e] := clean is performed.

When a node receives MSG(ℓ, value) on edge e, it acts as follows. It adds e to edges_rec_msg[ℓ], and then checks whether latest_ℓ ≺ ℓ. If so, then the message is a new one; in this case it updates the variables associated with the latest message, setting latest_ℓ := ℓ, latest_value := value, and rec_msg[ℓ] := true.
¬blocked; receive MSG(ℓ, value) on e →
    edges_rec_msg[ℓ] := edges_rec_msg[ℓ] ∪ {e};
    if latest_ℓ ≺ ℓ then
        latest_ℓ := ℓ;
        latest_value := value;
        rec_msg[ℓ] := true
    fi;
    for all ℓ', if ℓ' ≺ ℓ ∧ rec_msg[ℓ'] then rec_reply[ℓ'] := true fi
□
¬blocked; ¬rec_reply[ℓ]; rec_msg[ℓ]; status[e] = clean; e ∉ edges_sent_msg[latest_ℓ] →
    send MSG(latest_ℓ, latest_value) on e;
    status[e] := dirty;
    edges_sent_msg[latest_ℓ] := edges_sent_msg[latest_ℓ] ∪ {e};
□
¬blocked; receive REPLY(ℓ) on e →
    status[e] := clean;
    edges_rec_reply[ℓ] := edges_rec_reply[ℓ] ∪ {e};
    for all ℓ', if ℓ' ⪯ ℓ ∧ rec_msg[ℓ'] then rec_reply[ℓ'] := true fi; /* this includes ℓ itself */
□
¬blocked; rec_reply[ℓ]; e ∈ edges_rec_msg[ℓ]; e ∉ edges_sent_reply[ℓ] →
    send REPLY(ℓ) on e;
    edges_sent_reply[ℓ] := edges_sent_reply[ℓ] ∪ {e};
□
¬blocked; edges_sent_msg[ℓ] − edges_rec_reply[ℓ] = ∅; ℓ ≺ latest_ℓ →
    call Procedure New_Token(v, ℓ, edges_sent_msg[ℓ]); /* triggers the label protocol */
    edges_sent_msg[ℓ] := ∅; edges_sent_reply[ℓ] := ∅; rec_reply[ℓ] := false;
    edges_rec_msg[ℓ] := ∅; edges_rec_reply[ℓ] := ∅; rec_msg[ℓ] := false;

Figure 4: Code of the main protocol


The receipt of a message MSG(ℓ, value) implies that the receiving node can send a REPLY for all labels ℓ' ≺ ℓ, since a reply to all smaller messages must have already been received by the sender. To this end, the node sets rec_reply[ℓ'] := true for all ℓ' ≺ ℓ for which rec_msg[ℓ'] = true, i.e. for which a message has been received.

The latest message is then forwarded on each edge e at most once, provided the edge status is clean and no REPLY has been received for that message. It is clear why a message is forwarded only once; it is not forwarded after a REPLY has been received simply because the receiver has already received this message.

Whenever REPLY(ℓ) is received on e, the node sets status[e] := clean, adds e to edges_rec_reply[ℓ], and sets rec_reply[ℓ'] := true for all ℓ' ⪯ ℓ for which rec_msg[ℓ'] = true. Note that a message MSG(ℓ, value) serves as a REPLY only to labels ℓ' ≺ ℓ, while a REPLY(ℓ) is also a reply for the label ℓ. The reason for this difference is quite obvious: in this case the node can deduce that the message labeled ℓ, too, has already arrived at the receiver.
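The handling of MSG and REPLY at an unblocked ordinary node can be sketched in Python as follows. This is an illustrative sketch, not the paper's code: labels are modelled as integers with < standing in for ≺ (an assumption, since the real labels come from a bounded time-stamp system), the blocked flag, the forwarding of replies and the generation of tokens are left out, and the class and method names are hypothetical.

```python
from collections import defaultdict


class Node:
    """Unblocked ordinary node of the main protocol (labels modelled as ints)."""

    def __init__(self, name, edges):
        self.name = name
        self.edges = set(edges)
        self.latest_label = -1                 # assumed to lie below every real label
        self.latest_value = None
        self.rec_msg = defaultdict(bool)
        self.rec_reply = defaultdict(bool)
        self.edges_rec_msg = defaultdict(set)
        self.edges_sent_msg = defaultdict(set)
        self.edges_rec_reply = defaultdict(set)
        self.status = {e: "clean" for e in self.edges}

    def on_msg(self, label, value, e):
        self.edges_rec_msg[label].add(e)
        if self.latest_label < label:          # a new message
            self.latest_label, self.latest_value = label, value
            self.rec_msg[label] = True
        # MSG(label) acts as a reply for every strictly smaller known label
        for older in [l for l, seen in self.rec_msg.items() if seen and l < label]:
            self.rec_reply[older] = True

    def on_reply(self, label, e):
        self.status[e] = "clean"
        self.edges_rec_reply[label].add(e)
        # REPLY(label) also covers label itself
        for known in [l for l, seen in self.rec_msg.items() if seen and l <= label]:
            self.rec_reply[known] = True

    def forwardable_edges(self):
        """Edges on which the latest message may still be forwarded."""
        l = self.latest_label
        if not self.rec_msg[l] or self.rec_reply[l]:
            return set()
        return {e for e in self.edges
                if self.status[e] == "clean" and e not in self.edges_sent_msg[l]}

    def forward(self, e):
        l = self.latest_label
        self.status[e] = "dirty"
        self.edges_sent_msg[l].add(e)
        return ("MSG", l, self.latest_value)
```

In this sketch a dirty edge is skipped until a REPLY arrives on it, so a node never has two unreplied messages outstanding on the same edge.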

If rec_reply[ℓ] = true, the node will forward the reply labeled ℓ on each edge e, provided MSG was received on that edge (e ∈ edges_rec_msg[ℓ]) and this is the first reply on this edge (e ∉ edges_sent_reply[ℓ]).

Following the receipt of MSG(ℓ', value), with ℓ ≺ ℓ', and the receipt of all replies to the message, there is no reference to this label in the given node. At this time, the node generates a new “token” with label ℓ, to indicate that it and all its outgoing channels are clear of this label. The token contains the name of the node, the label ℓ and the list of edges edges_sent_msg[ℓ] on which MSG(ℓ, value) has been sent. This serves as input to the label protocol.
receive TOKEN(node, ℓ, edges) on e →
    send TOKEN_ACK on e;
    clean_nodes[ℓ] := clean_nodes[ℓ] ∪ {node};
    used_edges[ℓ] := used_edges[ℓ] ∪ {edges};
    if ∀ (u → w) ∈ used_edges[ℓ], w ∈ clean_nodes[ℓ] then ℓ̄ := ℓ̄ − {ℓ} fi; /* ℓ is dead */

Figure 5: Code of the label protocol at the sender


5  Label  protocol 

5.1  Properties  of  the  label  protocol 

The correctness of the main protocol depends on the properties of the label protocol, which provides new unused label values that can be added to the data items transmitted. The main interface between the main protocol and the label protocol is as follows. There exists a set of labels in the range L, and a relation ≺ among them. There is a predicate function free_label_available(ℓ̄), indicating that the labeling function ℒ(ℓ̄) can be executed correctly, returning a new label value to be used. The labeling function ℒ : L* → L returns a label ℓ ∈ L, and adds this label to ℓ̄. The label protocol is allowed to block (by changing the value of the variable blocked) the progress of the main protocol in any given process, in order to maintain the property of always having a free label available to the sender. It will suffice that in any system state, the label protocol has the following properties¹:

Q1 comparability: In any state, if a label ℓ exists in a process state S^t_v or channel state C^t_e, then ℓ ∈ ℓ̄ in S^t_s.

Q2 ordering: The labels in ℓ̄ are totally ordered by ≺, where if free_label_available(ℓ̄) holds, for any label ℓ' ∈ ℓ̄, it is the case that ℓ' ≺ ℒ(ℓ̄).

Q3 availability: In any state S^t, free_label_available(ℓ̄) holds.

Q4 non-blocking: Let B^t be the set of nodes for which, at any time t' > t, blocked = true. There is no time t'' such that B^{t''} forms a cut between the sender and the receiver.

The above properties formalize the idea that: 1. the order ≺ indicates the order in which labels were generated, 2. all label values in the system are totally ordered by ≺, 3. a free label is always available, and 4. the nodes that are blocked by the label protocol never form a cut between the sender and receiver.

As mentioned earlier, in the label protocol, only the sender is split into a special node and a normal node. All other nodes, including the receiver, are treated as normal nodes.

5.2  Sender’s  protocol 

The sender's protocol is given in Figure 5.

The purpose of this protocol is to determine which labels are no longer in use in the system. For each label ℓ, the sender tries to establish the nodes and edges having a variable or message of this label in the network. The variable used_edges[ℓ] denotes the set of edges traversed by messages MSG(ℓ, value), and the variable clean_nodes[ℓ] denotes all the nodes which will not send label ℓ any more.

¹ The exact definition of “exists” as in Q1 is given in Definitions 6.1 and 6.2.

As explained in the previous section, a token labeled ℓ and generated by a given node contains the set of edges on which MSG(ℓ, value) has been forwarded by this node.

In general, whenever the sender receives a TOKEN(node, ℓ, edges) message, it adds node to clean_nodes[ℓ], and adds edges to used_edges[ℓ]. Whenever, for each edge (u, w) ∈ used_edges[ℓ], both u and w are in clean_nodes[ℓ], the sender deduces that label ℓ does not exist in the network and thus deletes it from ℓ̄.
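The dead-label test performed by the sender can be illustrated with the following Python sketch. It follows the rule as stated in the text (both end-points of every used edge must be clean); the class name LabelTracker and the representation of edges as (u, w) pairs are assumptions made for illustration, and the TOKEN_ACK is omitted.

```python
class LabelTracker:
    """Sender-side bookkeeping of the label protocol (illustrative sketch)."""

    def __init__(self):
        self.labels = set()        # the set of labels believed to exist
        self.clean_nodes = {}      # label -> nodes that reported a token
        self.used_edges = {}       # label -> edges (u, w) that carried MSG(label, value)

    def on_token(self, node, label, edges):
        """Record TOKEN(node, label, edges); return True if the label is now dead."""
        clean = self.clean_nodes.setdefault(label, set())
        used = self.used_edges.setdefault(label, set())
        clean.add(node)
        used.update(edges)
        # Note: all() over an empty edge set is vacuously true, exactly as in
        # the rule as stated.
        dead = all(u in clean and w in clean for (u, w) in used)
        if dead:
            self.labels.discard(label)
        return dead
```

For instance, if node a reports that it forwarded label 7 on edge (a, b), the label stays alive until b has reported its own token for 7.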

5.3  Ordinary  node’s  protocol 

The  ordinary  node’s  protocol  is  presented  in  Figure  7. 


Procedure New_Token(node, ℓ, edges)
    tokens := tokens + 1;
    if tokens = n · A then blocked := true fi;
    token_set := token_set ∪ {(node, ℓ, edges)};
    for all e ≠ (v, s) do
        unreported[e] := unreported[e] + 1 od;
end New_Token;


Procedure Update(e)
    send UPDATE(unreported[e]) on e;
    wait_update_ack[e] := true;
    unreported[e] := 0
end Update;


Figure 6: The procedures


The ordinary nodes coordinate the delivery of tokens to the sender. The variable token_set denotes the set of tokens which have accumulated at this node, and the variable tokens denotes the cardinality of this set. A node updates its neighbors about the change in tokens. In order not to have many UPDATE messages in transit at the same time on a single edge e, the node accumulates the net change between two UPDATE messages in the local variable unreported[e]. In the proof we will claim that at any time t, the sum of estimate_u[v], unreported_v[(v, u)], and the value in the UPDATE message in transit from v to u (if one exists) is equal to tokens_v.

In general, whenever TOKEN(node, ℓ, edges) arrives at a node on edge e, the node adds it to token_set, sets tokens := tokens + 1, and sends a TOKEN_ACK message back, which acknowledges receipt of the TOKEN message. Also, for all edges e, the node increments unreported[e] by 1. For each adjacent edge e, a node maintains a boolean variable wait_token_ack[e], which is true after a TOKEN message was sent on e and before the TOKEN_ACK is received.

The label protocol has a parameter A = 112n. Each time the cardinality of the token set reaches the threshold n · A, the node sets blocked := true, disabling the operation of the main protocol. This effectively bounds the number of tokens in a node.

The purpose of the algorithm is to push all the tokens towards the sender. The tokens are carried by TOKEN messages. When such a message is received, a TOKEN_ACK message is sent back to acknowledge its arrival.

The updating on edge e is performed by sending the message UPDATE(unreported[e]) on that edge. Receipt of this message is acknowledged by a special message UPDATE_ACK. For each edge e, the node maintains a boolean flag wait_update_ack[e], which is true after sending UPDATE on e and before receiving UPDATE_ACK from e. The UPDATE(unreported[e]) message is sent on e whenever wait_update_ack[e] = false and unreported[e] ≠ 0. At this time, unreported[e] := 0 is set. Each node keeps, for each edge u, an estimate estimate[u] of the variable tokens at the other end-point of that edge. Whenever a node receives an UPDATE(x) message on edge e, it adds x to estimate[u], and sends back UPDATE_ACK.
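The update accounting just described can be checked on a small Python model of one directed pair (v reporting to u). This is a sketch under simplifying assumptions: a single edge, a channel holding at most one UPDATE, and delivery of the UPDATE and its UPDATE_ACK collapsed into one step. The class and function names are hypothetical.

```python
class Reporter:
    """Bookkeeping of node v towards one neighbor u."""

    def __init__(self):
        self.tokens = 0
        self.unreported = 0
        self.wait_update_ack = False
        self.in_transit = None          # value of the UPDATE on the channel, if any

    def change_tokens(self, delta):
        self.tokens += delta
        self.unreported += delta

    def maybe_send_update(self):
        """Send UPDATE(unreported) if none is pending and there is something to report."""
        if not self.wait_update_ack and self.unreported != 0:
            self.in_transit = self.unreported
            self.wait_update_ack = True
            self.unreported = 0
            return True
        return False


class Estimator:
    """Node u's estimate of v's token count."""

    def __init__(self):
        self.estimate = 0

    def on_update(self, x):
        self.estimate += x              # an UPDATE_ACK would then be sent back


def deliver(v, u):
    """Deliver the pending UPDATE to u and the UPDATE_ACK back to v."""
    u.on_update(v.in_transit)
    v.in_transit = None
    v.wait_update_ack = False


def invariant_holds(v, u):
    """tokens_v = estimate_u[v] + unreported_v[e] + UPDATE in transit."""
    return v.tokens == u.estimate + v.unreported + (v.in_transit or 0)
```

In this model the equality holds after every step, including while an UPDATE is in transit and the token count keeps changing.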

receive TOKEN(node, ℓ, edges) on e →
    if ¬wait_update_ack[e] ∧ unreported[e] ≠ 0 then call Procedure Update(e) fi;
    send TOKEN_ACK on e; /* ack the message */
    call Procedure New_Token(node, ℓ, edges)
□
¬wait_update_ack[e]; unreported[e] ≠ 0 →
    call Procedure Update(e)
□
tokens − estimate[u] > A; ¬wait_token_ack[e]; (unreported[e] = 0) ∨ wait_update_ack[e] →
    some_token := select_token(token_set); /* select an arbitrary token */
    send TOKEN(some_token) on e;
    wait_token_ack[e] := true;
    token_set := token_set − {some_token};
    tokens := tokens − 1;
    if tokens < n · A then blocked := false fi;
    for all e' ≠ (v, s) do unreported[e'] := unreported[e'] − 1 od;
□
receive UPDATE(x) on e →
    estimate[u] := estimate[u] + x;
    send UPDATE_ACK on e;
□
receive TOKEN_ACK on e →
    wait_token_ack[e] := false;
□
receive UPDATE_ACK on e →
    wait_update_ack[e] := false;

Figure 7: Code of the label protocol


As it is not clear which path to the sender is operational, the algorithm simply tries to balance the tokens more or less evenly between the nodes. That is, a node tries to push tokens to its neighbors along any edge u such that tokens − estimate[u] > A, i.e. the number of tokens on the other side is estimated to be less than the number of tokens at the node by more than A. However, the actual transmission is postponed until no TOKEN or TOKEN_ACK message is pending on the link, namely wait_token_ack[e] = false.
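The condition under which a node pushes a token along an edge can be written as a single predicate. The Python sketch below mirrors the third guard of Figure 7; the function name and the flat argument list are assumptions made for illustration.

```python
def may_push_token(tokens: int, estimate: int, A: int,
                   wait_token_ack: bool, unreported: int,
                   wait_update_ack: bool) -> bool:
    """Guard for sending a TOKEN on an edge.

    A token is pushed only if the neighbor is estimated to hold more than A
    fewer tokens, no TOKEN is awaiting its TOKEN_ACK on the edge, and either
    nothing is unreported on the edge or an UPDATE is already outstanding.
    """
    imbalance = tokens - estimate > A
    link_free = not wait_token_ack
    reports_in_order = unreported == 0 or wait_update_ack
    return imbalance and link_free and reports_in_order
```

For example, with A = 5 a node holding 10 tokens pushes towards a neighbor estimated at 3 tokens, but not towards one estimated at 5.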

One of the subtleties of the protocol is that upon arrival of a TOKEN a node must send an UPDATE message, if there is one to be sent, before it sends the TOKEN_ACK. This is intended to achieve the same effect as the update phases in the synchronous version of the algorithm informally described before. Failure to do so would allow the actual token distribution to differ significantly from the one known to the processors, and would result in an exponential increase in complexity.

6  Correctness  of  the  Main  Protocol 

In this section, it is proven that under the assumption that the label protocol is correct, that is, has properties Q1-Q4, the main protocol meets properties P1-P2. Then, in Section 7, the proof is completed by proving that the label protocol meets properties Q1-Q4. Finally, in Section 8, the complexity of the complete end-to-end protocol is shown to be polynomial.

Recall again that, as a convention, we superscript variables by time and subscript them by the node they belong to, e.g. tokens^t_v is the value of the local variable tokens of node v at time t (i.e. at S^t_v).
Definition 6.1 If in a processor state S^t_v there is an entry for label ℓ in one of the variables edges_rec_msg, edges_sent_msg, edges_sent_reply, edges_rec_reply, rec_reply, or rec_msg, or latest_ℓ = ℓ, then ℓ is said to exist in node v in state S^t.

Definition 6.2 If in a channel state C^t_e a message MSG(ℓ, value) or REPLY(ℓ) is in transit, then ℓ is said to exist in the channel e in state S^t.

Lemma 6.3 Let t_0 be a time at which the sender added ℓ to ℓ̄, and t_1 the earliest time after t_0 at which the sender deleted ℓ from ℓ̄. On each edge e = (u, v), the message MSG(ℓ, value) was sent at most once between time t_0 and t_1.

Proof: Assume by way of contradiction that it was sent twice, at times t_3 and t_4, where t_3 < t_4. The processor u sending at time t_3 the MSG(ℓ, value) on edge e had latest_ℓ = ℓ, and added e to edges_sent_msg[ℓ]. In order to have sent the message at time t_4, u must not have e in edges_sent_msg[ℓ] in S^{t_4}_u. The edge e could have been deleted from edges_sent_msg[ℓ] only by this variable being set to ∅. Together with the setting of edges_sent_msg[ℓ] to ∅, rec_msg[ℓ] must have been set to false. By the code, the setting could have been done only after latest_ℓ was set to a label ℓ', ℓ ≺ ℓ'. By Q1-Q2, at time t_0, ℓ was greater than all labels in S^{t_0}, and since ℓ remains in ℓ̄ till time t_1, any new label ℓ' added at time t, t_0 < t < t_1, is greater than any of the labels in S^t.

We claim that latest_ℓ could not have been less than ℓ during [t_3, t_4]. Indeed, consider the first time t, t_3 < t < t_4, at which latest_ℓ = ℓ̃ ≺ ℓ; moreover, assume that ℓ̂ has been the previous value of latest_ℓ. Clearly, ℓ̂ ≺ ℓ̃. Assume labels ℓ̃, ℓ̂ were generated at the sender at times t̃ and t̂ respectively. It must be that t̃ ∉ [t_0, t_1], since in this interval ℓ ∈ ℓ̄, and it would be the case that ℓ ≺ ℓ̃, a contradiction. Consequently, t̃ < t_0 and ℓ̃ ∈ S^{t_0}.

Since at time t_3, latest_ℓ = ℓ, and by the fact that t is the first time since t_3 that latest_ℓ ≺ ℓ, it follows that ℓ ⪯ ℓ̂. Therefore, t̂ ∈ [t_0, t_1]. Since ℓ̃ ∈ S^{t_0}, and could not be generated in [t_0, t_1], by definition ℓ̃ ∈ S^{t̂}, and therefore it is in ℓ̄ at time t̂ (by Q1). By Q2, at time t̂, ℓ̂ ≻ ℓ' for all ℓ' ∈ ℓ̄. In particular ℓ̂ ≻ ℓ̃, since ℓ̃ ∈ ℓ̄ at time t̂. This contradicts ℓ̂ ≺ ℓ̃.

Thus, rec_msg[ℓ] could not have been set to true once again before time t_4, a contradiction to the fact that processor u sent MSG(ℓ, value) on e at time t_4. □

Theorem 6.4 If the labeling protocol has properties Q1-Q2, then the main protocol has property P1.

Proof: Assume by way of contradiction that the above does not hold. By the sender's protocol, the values in I are input one after the other, the order of input time corresponding to the order in I. One of the following cases must thus be true:

1. There are values D_k and D_{k+1}, input respectively at times t_1 and t_2, t_1 < t_2, and there is no output of D_k at a time t_3, t_3 < t_4, where t_4 is the output time of D_{k+1}.

2. There is a value D_k input at time t_1 and output at two different times t_2 and t_3.

Case 1: By the protocol, with each value D_k a label ℓ is associated by the sender. In all processors, the data item sent in a message is latest_value, and is associated with latest_ℓ, both always updated in the same event, so it is never the case that the label ℓ associated in a MSG(ℓ, D_k) with a value D_k is ever changed. Let us thus denote the label assigned by the sender to D_k by ℓ_k. By the sender's protocol, a MSG(ℓ_k, D_k) must have been sent by the sender and REPLY(ℓ_k) received prior to the input of D_{k+1}, that is, between t_1 and t_2. By property Q1, immediately before the input of D_k was performed, ℓ_k did not exist in S^t. Thus, in order for a REPLY(ℓ_k) to have been received by the sender, it must be that the receiver performed an output event at some time t_3 < t_2. Since t_2 < t_4 by definition, a contradiction to the first case is derived.

Case 2: There is only one edge e = (u, r) leading to the receiver processor, on which a MSG(ℓ_k, D_k) could have been received. By the protocol, a value D_k is always associated with one label ℓ_k, and as long as D_k is in S^t, this label is in ℓ̄. Thus, by Lemma 6.3 the receiver could have received a MSG(ℓ_k, D_k) only once on e, and output the value D_k only once. □

The  following  is  the  proof  of  the  liveness  property  P2  of  the  main  protocol. 

Claim 6.5 There exists in the communication graph G = (V, E) a path p = s, v_1, ..., v_k, r of nodes, each edge of which is eventually connected and each node of which is eventually ¬blocked.

Proof: Define the following graph G'. The set of nodes includes those processors v for which, given any time t, there is a time t' > t such that in state S^{t'}_v the value of blocked_v is false. The set of edges includes the eventually connected edges among these nodes. By Q4, there is no cut between the receiver and the sender in G'. It is known from graph theory that if there is no cut between two nodes, then there is a path connecting them. □

Lemma 6.6 If MSG(ℓ, value) is sent on edge (v_i, v_{i+1}) in p (as in Claim 6.5), then eventually a REPLY(ℓ) must be received on that edge.

Proof: Assume by way of contradiction that there is an edge for which the lemma does not hold. Consider the edge (v_i, v_{i+1}) closest to the receiver for which it does not hold. Consider the last unreplied message MSG(ℓ, value) sent on (v_i, v_{i+1}). Since the edge (v_i, v_{i+1}) is eventually connected, there exists a time t_1 such that MSG(ℓ, value) is received at node v_{i+1}. If at state S^{t_1}_{v_{i+1}}, ℓ ≺ latest_ℓ at v_{i+1}, then, by the code of the main protocol, a message REPLY(ℓ) is sent from v_{i+1} to v_i, and eventually will be received. Therefore, at state S^{t_1}_{v_{i+1}}, latest_ℓ at v_{i+1} satisfies latest_ℓ ⪯ ℓ, and rec_reply[ℓ] is false.

Since the Lemma holds for edge (v_{i+1}, v_{i+2}), there is a time t_2 such that at state S^{t_2}_{v_{i+1}} the value of status[(v_{i+1}, v_{i+2})] is clean.

If rec_reply[ℓ] is true in S^{t_2}_{v_{i+1}}, a REPLY(ℓ) is sent to v_i and we are done. Therefore, the interesting case is when rec_reply[ℓ] is false in S^{t_2}_{v_{i+1}}. In this case v_{i+1} sends MSG(ℓ, value) on (v_{i+1}, v_{i+2}). Since the Lemma holds for (v_{i+1}, v_{i+2}), there exists a time t_3 such that a REPLY(ℓ) is received from v_{i+2}, and in state S^{t_3}_{v_{i+1}}, rec_reply[ℓ] is thus true. Therefore, at some time between t_2 and t_3, a REPLY(ℓ) was sent to v_i.

Since the edge (v_{i+1}, v_i) is eventually connected, the REPLY(ℓ) is received at v_i, contradicting the assumption. □

Corollary 6.7 If in some state S^t, status[(v_i, v_{i+1})] = dirty, then there exists a state S^{t'}, t < t', in which status[(v_i, v_{i+1})] = clean. □

Theorem 6.8 If the labeling protocol has properties Q1-Q4, then the main protocol has property P2.

Proof: Consider the sender in a state S^t_s in which trying is true. By Corollary 6.7, there is a time t_1 > t such that in S^{t_1}_s, status[(s, v_1)] = clean. By the code of the protocol, the sender sends MSG(ℓ, value) to v_1. By Lemma 6.6, there is a time t_2 > t_1 such that a REPLY(ℓ) is received. □

7  Correctness  of  the  Label  Protocol 

The following is the proof that the label protocol meets properties Q1-Q4.

Theorem 7.1 The labeling protocol has property Q2.

Proof: Follows directly from the properties of the sequential time-stamp system of size N = 1 + (A + 5)n², where the predicate function free_label_available(ℓ̄) is just |ℓ̄| < N. □

Definition 7.2 A token interval is the interval [t_0, t_1] from the time t_0 at which the TOKEN was sent till the time t_1 at which a TOKEN_ACK was received for it.

Claim 7.3 In any channel e = (u, v), the token intervals of TOKENs sent in a given direction are disjoint.

Proof: Follows from the code, since a TOKEN_ACK must be received for the latest TOKEN sent before the following TOKEN can be sent. □

Definition 7.4 Let UPDATE(x) be a message that is sent from v to u before time t, and not received until time t. (There is at most one such UPDATE message at any time t.) Define the “dummy variable” UPDATE^t_v[u] to have the value x if such a message UPDATE(x) exists; otherwise UPDATE^t_v[u] = 0 (this dummy variable will also be used in the complexity proof).

The following will lead to the proof that property Q3 is met.

Lemma 7.5 In any state, if in S^t_v tokens_v = x1 and unreported_v[e] = x2, and in S^t_u estimate_u[v] = y, and x1 − x2 ≠ y, then in C^t_{(v,u)} there is an UPDATE(x1 − x2 − y) message from v to u.

Proof: The proof is by induction on the sequence of all events at v and u. Initially x1 = x2 = y = 0. Assume the claim holds in state S^{t_1}, and let it be proven for any following state S^{t_2}. If the event between t_1 and t_2 was an increment or decrement of tokens, there was also a corresponding increment or decrement of unreported_v[e], and the claim holds. If the event is a send event of an UPDATE(x) message, there is a corresponding assignment of unreported_v[e] to 0. If the event is a receive event of an UPDATE(x) message by u, there is a corresponding addition of x to estimate_u[v]. In all other events there is no change of any of the related variables, and so the claim holds. □

Lemma 7.6 In any state S^t, tokens is at most An + 3n.

Proof: Assume by way of contradiction that the claim does not hold. Consider the earliest time t at which a process v in state S^t_v had tokens > An + 3n. Let t' be the latest time before t at which v in S^{t'}_v had tokens = An and following which tokens ≥ An in any S^{t''}_v, t'' ∈ [t', t] (i.e. the maximal interval that ends at t in which blocked = true in v).

Since 3n more tokens were added to tokens during [t', t], and no new token was generated by v, because it was in state blocked, there exists a process u from which v received at least 3 tokens on channel e = (u, v) during [t', t]. Recall that by Claim 7.3, the token intervals of these three TOKENs are disjoint.

The main argument of the proof is that in the state before u sent the third TOKEN to v during [t', t], estimate_u[v] was at least An. Since u sent the third TOKEN, tokens_u was at least An + A. Noting that A > 3n, a contradiction to the fact that v was the first node to have An + 3n tokens is reached. The rest of the proof will show that in fact at the time the third TOKEN was sent, estimate_u[v] ≥ An.

By Lemma 7.5, at time t', tokens_v = unreported_v[e] + estimate_u[v] + UPDATE(e). Since the number of tokens in v during the time interval [t', t] is at least An, the value of unreported_v[e], till the next send event of an UPDATE message, is always at least unreported_v^{t'}[e], and this number has to be positive.

Consider the first time after t' where unreported_v[e] = 0. This time occurs before v receives the second TOKEN from u. At time t', if unreported_v[e] > 0, then v must be waiting for an UPDATE_ACK. Node u will send the UPDATE_ACK before it sends the second token (since it receives the UPDATE before the TOKEN_ACK). Therefore, v will send an UPDATE before the second TOKEN is received. Just after the send of the UPDATE, the value of unreported_v[e] = 0. Thus, estimate_u[v] + UPDATE(e) ≥ An. The UPDATE must be received at node u before the receipt of the TOKEN_ACK for the second TOKEN. This implies that estimate_u[v] at this time is at least An. It is clear that estimate_u[v] will remain at least An at least till time t, implying that at the time the third TOKEN was sent, estimate_u[v] ≥ An. This contradicts the assumption that such a time t exists. □

Lemma 7.7 In any state S^t of the network, the sum of the number of tokens in each node (i.e. tokens_v), plus the tokens in all the channels, is at most (A + 3)n².

Proof: By Lemma 7.6 the number of tokens in a node in the tokens protocol is bounded by (A + 3)n. We claim that the number of tokens in a node plus the number of tokens on the incoming edges to a node is bounded by (A + 3)n. This follows from the observation that the adversary, by delivering all the tokens on the incoming edges to a node, can make the number of tokens in a node equal to the number of tokens it had plus the number of tokens on the edges. Therefore the overall number in all the nodes is at most (A + 3)n². □

Lemma 7.8 In any state S^t, the number of entries for different values of ℓ in the variables edges_sent_msg[ℓ], edges_rec_reply[ℓ], edges_rec_msg[ℓ], edges_sent_reply[ℓ], or latest_ℓ = ℓ, is bounded by 2n.

Proof: In any process, for any label ℓ, if one of the sets or variables corresponding to ℓ is not empty, there is an edge e in edges_sent_msg[ℓ] which is not in edges_rec_reply[ℓ], or an edge in edges_rec_msg[ℓ] which is not in edges_sent_reply[ℓ]. Since in this case a new message cannot be received on the edges on which replies were not sent, and new messages cannot be sent on edges on which replies have not yet been received, there is at least one edge corresponding to each non-empty entry, and the edges are different. The number of different possible entries is thus bounded by 2(n − 1), twice the number of incident edges, which together with the one additional value in latest_ℓ is less than 2n. □

Theorem  7.9  The  labeling  protocol  has  property  QS. 

Proof: It will suffice to prove that in any state $S^t$, the size of $\mathcal{L}$ is at most $(A + 5)n^2$. By Lemma 7.7 the number of tokens in the nodes and channels is at most $(A + 3)n^2$. By Lemma 7.8, each node has at most $2n$ different labels $l \in \mathcal{L}$. Therefore, the size of $\mathcal{L}$ is bounded by $(A + 5)n^2$. □
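
The arithmetic behind this bound can be spelled out as follows. This is only a sketch: it writes the set of labels in use as $\mathcal{L}$, and it assumes that each token and each per-node entry accounts for at most one label and that the network has $n$ nodes.

```latex
\begin{align*}
|\mathcal{L}|
  &\le \underbrace{(A+3)n^2}_{\text{tokens, Lemma 7.7}}
     + \underbrace{n \cdot 2n}_{\text{entries, Lemma 7.8}} \\
  &= (A+5)n^2 .
\end{align*}
```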

Theorem 7.10 The labeling protocol has property Q1.

Proof: We show that in any state $S^t$, if $l$ exists in some process or channel, then $l \in \mathcal{L}$. Initially the claim holds. Assume inductively that the claim holds in any state prior to $S^t$. Since by the code no process apart from the sender ever adds a variable entry or message of a non-existing label $l$ without previously receiving a message containing $l$, it will suffice to prove that the claim holds in the state $S^{t_1}$ following an event in which the sender deleted $l$ from $\mathcal{L}$.

Let $t_2$ be the latest time before $t_1$ at which $l$ was not in $\mathcal{L}$ (there is such a last time since initially $\mathcal{L}$ is empty). By the induction hypothesis, $l$ was not in any entry in a process or in any message on a channel in $S^{t_2}$.

Thus, by the main algorithm, any process $v$ having value $l$ in the time interval $[t_2, t_1]$ must have received a MSG($l$, value) from some process $u$ in a state following $S^{t_2}$. The edges field in any TOKEN($u$, $l$, edges) created in a process $u$ contains all edges to processes to which it sent MSG($l$, value). It follows that the condition $\forall (u \to w) \in$ used_edges[l], $w \in$ clean_nodes[l] holds at time $t_1$ only after a token has been received from every node that received a MSG($l$, value) at some time in the interval $[t_2, t_1]$.

Since a TOKEN($u$, $l$, edges) can be created in $u$ only after all messages sent by it have been replied to, none of its outgoing channels contains messages with label $l$. Recall that the creation event of the token removes all entries of $l$. By Lemma 6.3 the message MSG($l$, value) is sent on each edge at most once in the time interval $[t_2, t_1]$, therefore each node $v$ generates TOKEN($l$, ·, ·) at most once in the time interval $[t_2, t_1]$. If $\forall (u, w) \in$ used_edges[l], $u, w \in$ clean_nodes[l] holds at time $t_1$, then $l$ does not exist in any process or channel. Therefore, the sender can delete $l$ from $\mathcal{L}$. □

The  proof  of  the  following  Theorem  7.11  depends  on  the  proof  that  the  message  complexity  of  the 
tokens  protocol  is  bounded. 

Theorem 7.11 The labeling protocol has property Q4.

Proof: Assume by way of contradiction that there is a time $t_0$ such that the set of nodes for which blocked is true in every state $S^t$, $t > t_0$, forms a cut between the receiver and the sender. By the code of the main protocol, the nodes with blocked = true do not send MSG and TOKEN_ACK messages. In any state, either trying = true, or, since by Q3 free_label_available($l$) holds, trying will become true. Since the nodes with blocked = true form a cut between the receiver and the sender, from the first time at which trying becomes true after $t_0$ it will remain true forever, and eventually the sender will not send any new MSG messages.

Assume that no new_token$_v$ is generated by any node $v$. By Theorem 8.21 the message complexity is bounded as a function of $n$, therefore eventually there exists a time after which no more TOKENS are sent. Let this time be $t_1$.

In any state of the sender, $\mathit{tokens}_s = 0$ always. Any neighbor $v$ of $s$ has $\mathit{estimate}_v[s] = 0$. By the assumption, after time $t_1$, node $v$ does not send any more tokens. From the code of the label protocol, this implies that either $\mathit{tokens}_v - \mathit{estimate}_v[s] < A$ or wait_token_ack($v$, $s$) is true. Since the edge $(v, s)$ is eventually connected, eventually a TOKEN_ACK will be returned to $v$, and wait_token_ack($v$, $s$) will become false. Therefore, $\mathit{tokens}_v - \mathit{estimate}_v[s] < A$. Since $\mathit{estimate}_v[s] = 0$, $\mathit{tokens}_v < A$.

By induction on the distance from the sender (in the eventually connected network), the above observation can be extended to show that a node that has an eventually connected path of length $i$ to the sender has $\mathit{tokens} < iA$ after time $t_1$. Any node that is not in the cut, and has an eventually connected path to the sender that does not pass through the cut, has such a path of length at most $n - 2$ (since the receiver is not in the cut).

The nodes that have an eventually connected edge to a node that is in the cut have, by the previous claim, $\mathit{tokens} < (n - 2)A$. Therefore, either new tokens are generated, or one of the nodes in the cut is in a state in which $\neg$blocked.

To complete the proof, it is sufficient to show that the number of times a token can be generated locally, while no token is received by the sender, is bounded by $n^3$. This will imply that one of the nodes in the cut changes its state from blocked to $\neg$blocked, contradicting the assumption about the nodes in the cut. The following paragraph shows that the number of tokens generated, while no token is received by the sender, is bounded by $n^3$.

The number of tokens that can be created by a single MSG of label $l$ is at most $n$ (one per process). Since the sender is in a trying state, it will not add any MSG with new label values in any state following time $t_0$. Since the number of MSG messages in the channels in state $S^{t_0}$ is at most $n^2$, the number of new tokens added is bounded by $n^3$. □

8  The Complexity

The proof that the space, time and computation time of the protocol are polynomial follows immediately from the code of the protocol, given that the communication complexity is polynomial. In this section we therefore prove that the communication complexity of the end-to-end protocol is polynomial.

To assist in the analysis, we introduce the following two functions. The first is an energy function, $\mathcal{E}$, and the second is a potential function, $\phi$. It will be shown that each TOKEN message received reduces the sum $\mathcal{E} + 28\phi$ by at least $n$, and that the sum $\mathcal{E} + 28\phi$ is monotonically non-increasing in time, as long as no new token is generated and no TOKEN is received by the sender. Since the sum is bounded from above and below by a polynomial in $n$, the number of TOKENS sent is polynomial. The number of UPDATES is bounded by $2n$ times the number of TOKENS sent. Since, by the argument used in the proof of Theorem 7.11, the number of times a new token can be generated is at most $n^3$, the entire scheme has a polynomial message complexity.

Definition 8.1 For a node $v$ at time $t$, let $\mathit{intransit}_v^t$ be the number of TOKEN messages sent by $v$ before time $t$ and not received by time $t$. Let $\mu_v^t = \mathit{tokens}_v^t + \mathit{intransit}_v^t$.

Definition 8.2

$$\mathcal{E}^t = \sum_{v} (\mu_v^t)^2 + 28 \sum_{(u,v) \in E} \left( |\mathit{unreported}_v^t[u]| + |\mathit{UPDATE}_v^t[u]| \right)$$


Claim 8.3

$$0 \le \mathcal{E}^t \le 30(A + 3 + n)^2 n^3 = O(A^2 n^3)$$

Proof: The proof follows from the fact that each expression is non-negative and, by Lemma 7.6, is bounded from above by $(A + 3 + n)n$. □
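
One way to see where the constant 30 can come from is sketched below. The counting assumptions are ours and are not stated in the proof: the energy is taken to be the sum of $(\mu_v^t)^2$ over the $n$ nodes plus 28 times the per-edge terms, there are at most $n^2$ directed edges, and $A + 3 + n \ge 2$.

```latex
\begin{align*}
\sum_{v} (\mu_v^t)^2
  &\le n \cdot \big((A+3+n)\,n\big)^2 = (A+3+n)^2 n^3, \\
28 \sum_{(u,v) \in E} \big( |\mathit{unreported}| + |\mathit{UPDATE}| \big)
  &\le 28 \cdot n^2 \cdot 2(A+3+n)\,n = 56\,(A+3+n)\,n^3 \\
  &\le 29\,(A+3+n)^2 n^3, \\
\mathcal{E}^t
  &\le 30\,(A+3+n)^2 n^3 .
\end{align*}
```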

Definition 8.4 An unreported interval in a node $v$ is a maximal time interval $[t_0, t_1]$, where $t_0$ is the latest time prior to $t_1$, in which $\mathit{unreported}_v[e] \ne 0$.

Definition 8.5 An update interval of an UPDATE message sent from $u$ to $v$ at time $t_1$ and received at time $t_2$ is the time interval $[t_0, t_2]$, where $[t_0, t_1]$ is an unreported interval.

Claim 8.6 On a channel from $u$ to $v$, an update interval intersects at most seven (7) token intervals.

Proof: In the subinterval $[t_0, t_1]$ of the update interval from $u$ to $v$, $\mathit{unreported}_u[e] \ne 0$. If during this interval wait_update_ack$_u$ = false, then a TOKEN could not have been sent by $u$. If wait_update_ack$_u$ = true, then an UPDATE message must be in transit from $u$ to $v$, or an UPDATE_ACK message is in transit from $v$ to $u$. Since the receive event of the UPDATE message includes a sending of the UPDATE_ACK, a TOKEN sent by $u$ in the interval $[t_0, t_1]$ would have its corresponding TOKEN_ACK received by $u$ in the interval $[t_0, t_2]$. A TOKEN sent following this one (i.e. in $[t_1, t_2]$) must, by Claim 7.3, be sent after the TOKEN_ACK of the first was received, and would have its TOKEN_ACK received later than $t_2$. Since before the TOKEN sent by $u$ during $[t_0, t_1]$ at most one TOKEN could have been sent and received prior to $t_0$, at most three token intervals overlap $[t_0, t_2]$ for TOKENS sent from $u$ to $v$.

In the subinterval $[t_0, t_1]$ of the update interval from $u$ to $v$, $\mathit{unreported}_u[e] \ne 0$. If during this interval wait_update_ack$_u$ = false, then a TOKEN_ACK could not have been sent by $u$. If wait_update_ack$_u$ = true, then an UPDATE message must be in transit from $u$ to $v$, or an UPDATE_ACK message is in transit from $v$ to $u$. Let $\alpha$ be the first TOKEN message sent from $v$ to $u$ at time $t_3 > t_0$. Let $t_4$ be the time that the TOKEN_ACK for $\alpha$ was received at $v$. If at time $t_0$ there was an UPDATE in transit between $u$ and $v$, it must have been received at $v$ by time $t_5 < t_4$. Since the receive event of the UPDATE message includes a sending of the UPDATE_ACK, a message UPDATE_ACK must have been sent to $u$ before $t_4$. Let $\beta$ be the first TOKEN message sent from $v$ to $u$ at time $t_6 > t_4$, and $t_8$ the time at which it was received at $u$. Since the UPDATE_ACK was sent before $t_5$, it will be received at $u$ at time $t_7 < t_6$. At time $t_7$, wait_update_ack = false. Therefore, at time $t_8$, when $\beta$ is received at $u$, either an UPDATE message was sent between $t_7$ and $t_8$, or wait_update_ack = false. In the latter case an UPDATE message is sent to $v$ at time $t_8$. If $t_9$ is the time at which the TOKEN_ACK for $\beta$ was received at $v$, then the UPDATE was received before $t_9$. Since the receive event of the UPDATE message includes a sending of the UPDATE_ACK, a message UPDATE_ACK will be sent to $u$ before $t_9$. Finally, a token $\gamma$ sent after $t_9$ will be received by $u$ after the UPDATE_ACK. Since before the TOKEN $\alpha$ at most one additional TOKEN could have been sent and received prior to $t_0$, at most four token intervals overlap $[t_0, t_2]$ for TOKENS sent from $v$ to $u$. Thus, at most seven token intervals could have intersected $[t_0, t_2]$ in both directions combined. □

By  the  same  arguments,  the  following  claim  is  also  true. 

Claim  8.7  On  a  channel  from  u  to  v,  an  unreported  interval  intersects  at  most  seven  (7)  token  intervals. 

We define a potential function $\rho$. The main purpose of this potential function is to allow us to amortize, in a given state, over events that will happen in the future.

Definition 8.8 For every TOKEN message $\alpha$, define a potential function $\rho(\alpha)$, and let it change in the following way:

1. At the time $t$ when $\alpha$ is sent, $\rho^t(\alpha)$ is decremented by $n$.

2. For an update interval that intersects the token interval of $\alpha$, at the time $t$ that the message UPDATE($x$) was received, $\rho(\alpha)$ is incremented to $\max\{\rho(\alpha) + |x|/7, 2An\}$.

3. At time $t$ in an unreported interval (either in $u$ or $v$) that intersects the token interval of $\alpha$, such that at time $t$, either at $u$ or $v$, a token was sent or received and $|\mathit{unreported}[e]|$ was reduced by one, $\rho^t(\alpha)$ is incremented to $\max\{\rho(\alpha) + 1/7, 2An\}$.

4. Let $t$ be the time that $\alpha$ is received, and $t'$ the time just before $t$. Then $\rho^t(\alpha)$ is set to $\rho^{t'}(\alpha) - \frac{1}{14}\left(A - n - (\mu_v^t - \mu_u^t)\right) + n$.

Claim 8.9 $|\rho^t(\alpha)| \le O(An)$ □

The following two lemmas provide a lower bound on the increase in $\rho$ with respect to UPDATE messages and changes in $\mathit{unreported}$.

Lemma 8.10 Let $\alpha$ be a TOKEN message sent from $v$ to $u$. Consider an unreported interval $[t_0, t_1]$ in $v$ ($u$ resp.) that is not a part of an update interval and intersects the token interval of $\alpha$. Let $K$ be the maximum value of $|\mathit{unreported}_v[e]|$ ($|\mathit{unreported}_u[e]|$ resp.) in this interval. Then the sum of the increases of $\rho(\alpha)$ in this interval is at least $K/7$.

Proof: Consider the time $t'$ such that $|\mathit{unreported}_v[e]| = K$. At time $t_1$, by definition, $\mathit{unreported}_v[e] = 0$. Between $t'$ and $t_1$ there were at least $K$ decreases in $|\mathit{unreported}_v[e]|$. Each such decrease by definition contributes $1/7$. □

Lemma 8.11 Let $\alpha$ be a TOKEN message sent from $v$ to $u$. Consider an update interval $[t_0, t_2]$ in $v$ (resp. $u$) that intersects the token interval of $\alpha$. Let $t_1$ be the time at which UPDATE was sent. Let $K$ be the maximum value of $|\mathit{unreported}_v[e]|$ ($|\mathit{unreported}_u[e]|$ resp.) in this interval. Then the sum of the increases of $\rho(\alpha)$ in this interval is at least $K/7$.

Proof: Consider the time $t'$ such that $|\mathit{unreported}_v[e]| = K$. At time $t_1$ the value of $\mathit{unreported}_v[e]$ was $x$. Between $t'$ and $t_1$ there were at least $K - |x|$ decreases in $|\mathit{unreported}_v[e]|$. Each such decrease by definition contributes $1/7$. At time $t_1$ the message UPDATE($x$) was sent. At time $t_2$, when the message was received at $u$, $\rho(\alpha)$ increased by $|x|/7$. Thus, the sum of the increases is at least $K/7$. □
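
The accounting in this proof can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the protocol: the function name is hypothetical, it assumes exactly $K - |x|$ unit decreases occur before the UPDATE is sent (the proof only needs "at least"), and it ignores the cap applied to the potential.

```python
from fractions import Fraction


def increases_over_update_interval(K: int, x: int) -> Fraction:
    """Total increase credited to the potential of one token over an update interval.

    K is the maximum of |unreported| in the interval and x is the value
    carried by the UPDATE message (hypothetical parameter names).
    """
    # Each unit decrease of |unreported| before the UPDATE is sent adds 1/7.
    unit_decreases = K - abs(x)
    from_decreases = unit_decreases * Fraction(1, 7)
    # Receiving UPDATE(x) adds |x|/7 in one step.
    from_update = Fraction(abs(x), 7)
    return from_decreases + from_update


if __name__ == "__main__":
    print(increases_over_update_interval(10, 4))
```

The two contributions always add up to $K/7$, independently of how the total is split between unit decreases and the reported value $x$.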

The following lemma provides a lower bound on the total increase in $\rho$ as a function of the final difference between the number of tokens in the two end processors.

Lemma 8.12 Let $t_0$ be the time that TOKEN $\alpha$ was sent from $v$ to $u$, and $t_1$ the time it was received. The sum of all increases of $\rho(\alpha)$ (over all time) is at least $[A - (\mu_v^{t_1} - \mu_u^{t_1}) - n]/14$.

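Proof: Let $x_1$ be the value of the UPDATE that crosses $\alpha$ from $u$ to $v$ and $x_2$ be the value of the UPDATE that crosses $\alpha$ from $v$ to $u$.

Let $\eta$ be $\mathit{estimate}_v[u]$, $\tau_0$ the value of $\mathit{tokens}_v^{t_0}$, and $\nu_0$ the value of $\mathit{unreported}_v^{t_0}[e]$. By the code, $\tau_0 - \eta \ge A$.

Let $\tau_1$ be the value of $\mathit{tokens}_v^{t_1}$, $\nu_1$ the value of $\mathit{unreported}_v^{t_1}[e]$, and $\iota_1$ the value of $\mathit{intransit}_v^{t_1}$. This means that $\mu_v^{t_1} = \tau_1 + \iota_1$. By Lemma 7.5, the value of $\tau_1$ (i.e. $\mathit{tokens}_v^{t_1}$) is equal to $\tau_0 + x_2 + (\nu_1 - \nu_0)$.

Let $\tau_2$ be the value of $\mathit{tokens}_u^{t_1}$, $\nu_2$ the value of $\mathit{unreported}_u^{t_1}[e]$, and $\iota_2$ the value of $\mathit{intransit}_u^{t_1}$. This means that $\mu_u^{t_1} = \tau_2 + \iota_2$. The value $\tau_2$ (i.e. $\mathit{tokens}_u^{t_1}$) is equal to $\eta + x_1 + \nu_2$.

$$\mu_v^{t_1} - \mu_u^{t_1} = [(\tau_0 + x_2 + \nu_1 - \nu_0) + \iota_1] - [(\eta + x_1 + \nu_2) + \iota_2]$$

which we can rewrite as

$$(\tau_0 - \eta) - (\mu_v^{t_1} - \mu_u^{t_1}) = [(x_1 + \nu_2) + \iota_2] - [(x_2 + \nu_1 - \nu_0) + \iota_1]$$

Since $0 \le \iota_1, \iota_2 \le n - 1$, we have $|\iota_2 - \iota_1| < n$, and therefore

$$(\tau_0 - \eta) - (\mu_v^{t_1} - \mu_u^{t_1}) - n \le |\nu_0| + |x_2| + |\nu_1| + |x_1| + |\nu_2|$$

Recall that $A \le \tau_0 - \eta$, hence

$$\tfrac{1}{2}\left(A - (\mu_v^{t_1} - \mu_u^{t_1}) - n\right) \le \max\{|\nu_0|, |x_2|\} + |\nu_2| + |x_1| + |\nu_1|$$

Every term in the sum is identified with a distinct update or unreported interval. By Lemma 8.10 and Lemma 8.11, each term contributes to $\rho(\alpha)$ one seventh of the maximum value, and the lemma follows. □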

Definition 8.13 Let $T_e^t$ be the set of tokens $\alpha$ sent on edge $e$ such that either $t$ is in the token interval of $\alpha$, or at time $t$ there is an update or an unreported interval that intersects the token interval of $\alpha$. Denote $T^t = \bigcup_{e \in E} T_e^t$. Let

$$\phi^t = \sum_{\alpha \in T^t} \rho^t(\alpha)$$


Claim 8.14 $|\phi^t| = O(An^3)$

Proof: By Claim 8.9, $|\rho^t(\alpha)| = O(nA)$, and by Claim 8.6, the size of $T^t$ is bounded by $7n^2$. □

Lemma 8.15 Let $\alpha$ be a TOKEN message for which the token interval ends before time $t$, and $\alpha \notin T^t$. Then for any $t' > t$, $\rho^{t'}(\alpha) = \rho^t(\alpha)$.

Proof: Since $\alpha$ was already received at time $t$, $\rho(\alpha)$ cannot decrease after time $t$. The fact that it cannot increase after time $t$ is immediate from the definition of the increases. □

The following lemma shows that the value of each $\rho(\alpha)$ eventually becomes non-negative, and therefore dropping such terms from the sum in $\phi$ cannot increase the value of $\phi$.

Lemma 8.16 Let $\alpha$ be a TOKEN message for which the token interval ends before time $t$, and $\alpha \notin T^t$. Then $\rho^t(\alpha) \ge 0$.

Proof: By Lemma 8.15, after time $t$, $\rho(\alpha)$ does not change. The function $\rho(\alpha)$ can be decreased at most twice. When $\alpha$ is sent, $\rho(\alpha)$ is decreased by $n$. At the receipt, the value of $\rho(\alpha)$ is decreased by $\frac{1}{14}\left(A - n - (\mu_v^{t_1} - \mu_u^{t_1})\right) - n$. By Lemma 8.12, the sum of the increases is at least $[A - (\mu_v^{t_1} - \mu_u^{t_1}) - n]/14$. Therefore, $\rho(\alpha) \ge 0$. □

The following lemma shows that the sum $\mathcal{E} + 28\phi$ decreases by at least $n$ after each receipt of a TOKEN message. This implies that the number of such messages can be bounded by $[\mathcal{E} + 28\phi]/n$.

Lemma 8.17 Let $\alpha$ be a TOKEN message sent from $v$ and received at $u$ at time $t$. Let $t'$, $t < t'$, be the time immediately after the receive event of $\alpha$. Then

$$(\mathcal{E}^t + 28\phi^t) - (\mathcal{E}^{t'} + 28\phi^{t'}) \ge n$$

Proof: At the receipt of $\alpha$, $\mu_v$ is decremented by one, and $\mu_u$ is incremented by one. This implies that $(\mu_v^t)^2 + (\mu_u^t)^2$ is changed to $(\mu_v^t - 1)^2 + (\mu_u^t + 1)^2$, hence the difference is $2\mu_v^t - 2\mu_u^t - 2$. Since the increase in $\mathit{unreported}$ is bounded by $n$, $\mathcal{E}^t - \mathcal{E}^{t'} \ge 2\mu_v^t - 2\mu_u^t - 2 - 28n$.

The value of $\phi^t$ is changed to $\phi^t - \frac{1}{14}\left(A - n - (\mu_v^t - \mu_u^t)\right) + n$. Therefore $28\phi^t - 28\phi^{t'} = 2A - 2n - 2(\mu_v^t - \mu_u^t) - 28n$. Hence, the sum of the two is $(\mathcal{E}^t + 28\phi^t) - (\mathcal{E}^{t'} + 28\phi^{t'}) \ge 2A - 58n - 2 \ge n$. For $A \ge 30n$, this holds. □
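
The arithmetic of this proof can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the unreported terms are charged their worst case of $28n$, and the potential change is taken directly from the expression in the proof.

```python
def energy_drop(mu_v: int, mu_u: int) -> int:
    """Decrease of mu_v^2 + mu_u^2 when one token moves from v to u."""
    before = mu_v ** 2 + mu_u ** 2
    after = (mu_v - 1) ** 2 + (mu_u + 1) ** 2
    return before - after  # equals 2*mu_v - 2*mu_u - 2


def total_drop_lower_bound(A: int, n: int, mu_v: int, mu_u: int) -> int:
    """Lower bound on the drop of E + 28*phi at a TOKEN receipt."""
    # Energy part: the squared terms drop, the unreported terms may grow by up to 28n.
    energy_part = energy_drop(mu_v, mu_u) - 28 * n
    # Potential part: 28 times the change in phi, as computed in the proof.
    potential_part = 2 * A - 2 * n - 2 * (mu_v - mu_u) - 28 * n
    return energy_part + potential_part  # equals 2*A - 58*n - 2


if __name__ == "__main__":
    n = 4
    A = 30 * n  # smallest threshold allowed by the lemma
    for mu_v, mu_u in [(10, 3), (200, 0), (5, 5)]:
        bound = total_drop_lower_bound(A, n, mu_v, mu_u)
        print(mu_v, mu_u, bound, bound >= n)
```

The $\mu$ terms cancel, so the bound depends only on $A$ and $n$; with $A = 30n$ it equals $2n - 2$, which is at least $n$ whenever $n \ge 2$.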

The following lemma establishes the invariant that $\mathcal{E}^t + 28\phi^t$ is monotonically non-increasing in time. This fact, combined with Lemma 8.17, will establish the polynomial convergence of the algorithm.

Lemma 8.18 Let $S^{t_0}$ and $S^{t_1}$ be two states such that $t_0 < t_1$. Then $\mathcal{E}^{t_0} + 28\phi^{t_0} \ge \mathcal{E}^{t_1} + 28\phi^{t_1}$.

Proof: The only events that can affect the value of $\mathcal{E} + 28\phi$ are the sending of a TOKEN message, the receipt of a TOKEN message, the sending of an UPDATE message, or the receipt of an UPDATE message. When a receive token event occurs, by Lemma 8.17, the sum $\mathcal{E} + 28\phi$ is decremented by at least $n$.

When a TOKEN message $\alpha$ is sent from $v$, the value of $\mu_v$ does not change. The $\mathit{unreported}$ variables that are updated contribute at most $28n$ to $\mathcal{E}$. Since $\rho(\alpha)$ is decremented by $n$, $\phi$ is decremented by $n$, and therefore the sum $\mathcal{E} + 28\phi$ can only decrease in this case.

When an UPDATE($x$) message is sent, the value of $\mathcal{E}$ does not change: the increase in $|\mathit{UPDATE}|$ is equal to the decrease in $|\mathit{unreported}|$. The value of $\phi$ clearly remains unchanged.

When an UPDATE($x$) message is received, the value of $\mathcal{E}$ is decremented by $28|x|$. By Claim 8.6, an update interval can intersect at most seven token intervals. Each of them will increase $\phi$ by $|x|/7$. Therefore the increase in $\phi$ is bounded by $|x|$, so $28\phi$ grows by at most $28|x|$ and the sum does not increase. □

Lemma 8.19 As long as no new token is generated locally and no TOKEN is received at the sender, the number of TOKEN messages sent in the labeling protocol is bounded by $O(A^2 n^2)$, and the number of UPDATE messages is bounded by $O(A^2 n^3)$.

Proof: At any time $|\mathcal{E}^t + 28\phi^t| = O(A^2 n^3)$, by Claims 8.3 and 8.14. By Lemmas 8.17 and 8.18, the maximum number of tokens transmitted is bounded by $O(A^2 n^2)$. The number of UPDATE messages is bounded by $O(A^2 n^3)$. □
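
The counting step can be made explicit as follows. This is a sketch that combines the range of the sum $\mathcal{E} + 28\phi$ with the drop of at least $n$ per received TOKEN, and uses the factor of $2n$ UPDATE messages per TOKEN mentioned at the start of this section.

```latex
\begin{align*}
\#\text{TOKEN}
  &\le \frac{O(A^2 n^3)}{n} = O(A^2 n^2), \\
\#\text{UPDATE}
  &\le 2n \cdot \#\text{TOKEN} = O(A^2 n^3).
\end{align*}
```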

In order to show that the algorithm is polynomial it would have been sufficient to multiply the above complexity by $n^3$, the number of tokens that can be generated without any TOKEN being returned to the sender. In the rest of the complexity analysis we show how to account for the influence of the tokens that are generated without a great penalty in the message complexity. The main idea is to analyze the influence of creating a TOKEN on the energy function.

Lemma 8.20 If $k$ new TOKENS are generated and no TOKEN is received at the sender, the number of TOKEN messages sent in the labeling protocol is bounded by $O(A^2 n^2 + kA^2 n)$, and the number of UPDATE messages is bounded by $O(A^2 n^3 + kA^2 n^2)$.

Proof: Each token that is generated increases $\mathcal{E} + 28\phi$ by at most $A^2 n^2$. The total increase is bounded by $kA^2 n^2$. Therefore, the number of tokens is bounded by $O(A^2 n^2 + kA^2 n)$, and the number of UPDATE messages by $O(A^2 n^3 + kA^2 n^2)$. □

Theorem 8.21 The communication complexity of the end-to-end protocol is $O(n^9)$ bits.

Proof: The complexity of the main protocol is $O(n^4)$ bits per data item transmitted. Since the number of TOKENS generated locally by the labeling algorithm is bounded by $O(n^3)$, and since each time stamp label $l$ is of $O(n^3)$ bits, and each UPDATE message is of $O(\log n)$ bits, the communication complexity of the protocol is, by Lemma 8.20, $O(n^9)$ bits. □
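
The exponent can be traced with the following back-of-the-envelope calculation. It is a sketch under assumptions that the proof does not state explicitly: the threshold satisfies $A = \Theta(n)$ (the analysis only requires $A \ge 30n$), the number of locally generated tokens is $k = O(n^3)$, the message counts are those derived in the proof of Lemma 8.20, and the size of a TOKEN message is dominated by its $O(n^3)$-bit label.

```latex
\begin{align*}
\#\text{TOKEN}  &= O(A^2 n^2 + k A^2 n)   = O(n^4 + n^6) = O(n^6), \\
\#\text{UPDATE} &= O(A^2 n^3 + k A^2 n^2) = O(n^5 + n^7) = O(n^7), \\
\text{bits}     &= O(n^6) \cdot O(n^3) + O(n^7) \cdot O(\log n)
                 = O(n^9 + n^7 \log n) = O(n^9).
\end{align*}
```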